Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neural networks generalize remains one of the most important unsolved problems in the field. In this work, we conduct an extensive empirical study (2200 models, 16 tasks) to investigate whether insights from the theory of computation can predict the limits of neural network generalization in practice. We demonstrate that grouping tasks according to the Chomsky hierarchy allows us to forecast whether certain architectures will be able to generalize to out-of-distribution inputs. This includes negative results where even large amounts of data and training time never lead to any non-trivial generalization, despite models having sufficient capacity to fit the training data perfectly. Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, LSTMs can solve regular and counter-language tasks, and only networks augmented with structured memory (such as a stack or memory tape) can successfully generalize on context-free and context-sensitive tasks.
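A minimal sketch of the kind of out-of-distribution evaluation this abstract describes: train on short instances of a formal-language task and test on strictly longer ones. The parity task, the tiny RNN, and all hyperparameters below are illustrative placeholders (assuming PyTorch), not the paper's actual experimental setup.

```python
# Hypothetical length-generalization check on a regular language (parity).
# Train on short bit strings, evaluate on longer, out-of-distribution strings.
import torch
import torch.nn as nn

def make_parity_batch(batch, min_len, max_len):
    """Random bit strings with their parity label."""
    length = torch.randint(min_len, max_len + 1, (1,)).item()
    x = torch.randint(0, 2, (batch, length)).float()
    y = (x.sum(dim=1) % 2).long()
    return x.unsqueeze(-1), y          # (batch, length, 1), (batch,)

class TinyRNN(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.RNN(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, x):
        _, h = self.rnn(x)             # final hidden state summarizes the string
        return self.head(h[-1])

model = TinyRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):               # train only on lengths 1..20
    x, y = make_parity_batch(64, 1, 20)
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():                  # test on unseen lengths 50..100
    x, y = make_parity_batch(1024, 50, 100)
    acc = (model(x).argmax(dim=1) == y).float().mean()
    print(f"OOD accuracy on longer strings: {acc:.2f}")
```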
Policy regularization methods such as maximum entropy regularization are widely used in reinforcement learning to improve the robustness of a learned policy. In this paper, we show how this robustness arises from hedging against worst-case perturbations of the reward function, which are chosen by an imagined adversary from a constrained set. Using convex duality, we characterize this robust set of adversarial reward perturbations under KL- and alpha-divergence regularization, which includes Shannon and Tsallis entropy regularization as special cases. Importantly, generalization guarantees can be given within this robust set. We provide a detailed discussion of the worst-case reward perturbations and present intuitive empirical examples to illustrate this robustness and its relationship with generalization. Finally, we discuss how our analysis complements and extends previous results on adversarial reward robustness and path consistency optimality conditions.
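A schematic, one-step way to write the duality this abstract refers to (a simplified illustration, not the paper's exact theorem): an entropy-regularized objective can be rewritten as an unregularized objective under a worst-case reward perturbation drawn from a feasible set determined by the regularizer,

\[
\max_{\pi}\; \mathbb{E}_{a\sim\pi}\!\left[r(a)\right] - \tfrac{1}{\beta}\,\mathrm{KL}\!\left(\pi \,\|\, \pi_0\right)
\;=\;
\max_{\pi}\;\min_{r' \in \mathcal{R}_{r}}\; \mathbb{E}_{a\sim\pi}\!\left[r'(a)\right],
\]

where \(\mathcal{R}_{r}\) is a convex set of perturbed reward functions around \(r\) whose size is controlled by the regularization strength \(1/\beta\). Hedging against the adversarial choice of \(r' \in \mathcal{R}_{r}\) is the source of the robustness discussed above.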
We extend temporal-difference (TD) learning to obtain risk-sensitive, model-free reinforcement learning algorithms. This extension can be viewed as a modification of the Rescorla-Wagner rule, in which the stimulus is taken to be the event of over- or underestimating the TD target. As a result, one obtains a stochastic approximation rule for estimating the free energy from i.i.d. samples generated by a Gaussian distribution with unknown mean and variance. Since the Gaussian free energy is known to be a certainty equivalent that is sensitive to both the mean and the variance, the learning rule has applications in risk-sensitive decision-making.
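For context, the free energy has a closed form in the Gaussian case, which is what makes it a natural risk-sensitive target (a standard identity, stated here only for illustration): for \(X \sim \mathcal{N}(\mu, \sigma^2)\),

\[
F_{\beta}[X] \;=\; \frac{1}{\beta}\,\log \mathbb{E}\!\left[e^{\beta X}\right] \;=\; \mu + \frac{\beta\,\sigma^{2}}{2},
\]

so a stochastic-approximation rule that estimates \(F_\beta\) from i.i.d. samples implicitly tracks both the mean and the variance, with the sign of \(\beta\) setting the risk attitude (risk-seeking for \(\beta > 0\), risk-averse for \(\beta < 0\), and the ordinary mean estimate recovered as \(\beta \to 0\)).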
In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in the Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve better performance than previous agents with less data, and crucially exhibits positive transfer between tasks as a result of its multi-task approach. The source code is publicly available at github.com/deepmind/scalable_agent.
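For readers unfamiliar with V-trace, the n-step target has the following basic form (restated here as background, with notation following the IMPALA paper; \(\pi\) is the learner policy, \(\mu\) the behaviour policy, and \(\bar\rho \ge \bar c\) are clipping constants):

\[
v_s \;=\; V(x_s) \;+\; \sum_{t=s}^{s+n-1} \gamma^{\,t-s}\Bigl(\prod_{i=s}^{t-1} c_i\Bigr)\,\delta_t V,
\qquad
\delta_t V \;=\; \rho_t\bigl(r_t + \gamma V(x_{t+1}) - V(x_t)\bigr),
\]

with truncated importance weights \(\rho_t = \min\!\bigl(\bar\rho,\ \tfrac{\pi(a_t\mid x_t)}{\mu(a_t\mid x_t)}\bigr)\) and \(c_i = \min\!\bigl(\bar c,\ \tfrac{\pi(a_i\mid x_i)}{\mu(a_i\mid x_i)}\bigr)\). The value function is regressed toward \(v_s\), which corrects for the lag between the actors' behaviour policy and the learner's current policy.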
Research has shown that climate change creates warmer temperatures and drier conditions, leading to longer wildfire seasons and increased wildfire risks in the United States. These factors have in turn led to increases in the frequency, extent, and severity of wildfires in recent years. Given the danger posed by wildland fires to people, property, wildlife, and the environment, there is an urgency to provide tools for effective wildfire management. Early detection of wildfires is essential to minimizing potentially catastrophic destruction. In this paper, we present our work on integrating multiple data sources in SmokeyNet, a deep learning model using spatio-temporal information to detect smoke from wildland fires. Camera image data is integrated with weather sensor measurements and processed by SmokeyNet to create a multimodal wildland fire smoke detection system. We present our results comparing performance in terms of both accuracy and time-to-detection for multimodal data vs. a single data source. With a time-to-detection of only a few minutes, SmokeyNet can serve as an automated early notification system, providing a useful tool in the fight against destructive wildfires.
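A minimal sketch of the kind of multimodal fusion described above, assuming a late-fusion design in which image features from a camera backbone are concatenated with weather-sensor readings before classification; the module names, feature sizes, and fusion strategy are illustrative assumptions, not the actual SmokeyNet architecture.

```python
# Illustrative late-fusion smoke classifier: camera images + weather sensors.
# All dimensions and module choices are assumptions, not SmokeyNet's design.
import torch
import torch.nn as nn

class MultimodalSmokeClassifier(nn.Module):
    def __init__(self, num_weather_features=4, img_feat_dim=128):
        super().__init__()
        self.image_backbone = nn.Sequential(        # stand-in for a CNN backbone
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, img_feat_dim), nn.ReLU(),
        )
        self.weather_encoder = nn.Sequential(       # e.g. temperature, humidity, wind
            nn.Linear(num_weather_features, 32), nn.ReLU(),
        )
        self.head = nn.Linear(img_feat_dim + 32, 2)  # smoke vs. no smoke

    def forward(self, image, weather):
        fused = torch.cat([self.image_backbone(image),
                           self.weather_encoder(weather)], dim=-1)
        return self.head(fused)

model = MultimodalSmokeClassifier()
logits = model(torch.randn(8, 3, 224, 224), torch.randn(8, 4))
print(logits.shape)  # torch.Size([8, 2])
```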
There is significant interest in deploying machine learning algorithms for diagnostic radiology, as modern learning techniques have made it possible to detect abnormalities in medical images within minutes. While machine-assisted diagnoses cannot yet reliably replace human reviews of images by a radiologist, they could inform prioritization rules for determining the order by which to review patient cases so that patients with time-sensitive conditions could benefit from early intervention. We study this scenario by formulating it as a learning-augmented online scheduling problem. We are given information about each arriving patient's urgency level in advance, but these predictions are inevitably error-prone. In this formulation, we face the challenges of decision making under imperfect information, and of responding dynamically to prediction error as we observe better data in real-time. We propose a simple online policy and show that this policy is in fact the best possible in certain stylized settings. We also demonstrate that our policy achieves the two desiderata of online algorithms with predictions: consistency (performance improvement with prediction accuracy) and robustness (protection against the worst case). We complement our theoretical findings with empirical evaluations of the policy under settings that more accurately reflect clinical scenarios in the real world.
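For readers unfamiliar with the learning-augmented framework, the two desiderata mentioned above are usually formalized via the competitive ratio (standard definitions from the algorithms-with-predictions literature, not the paper's specific bounds): an online algorithm ALG with access to predictions is

\[
\underbrace{\mathrm{cost}(\mathrm{ALG}) \le \alpha \cdot \mathrm{cost}(\mathrm{OPT})\ \text{when predictions are error-free}}_{\alpha\text{-consistency}}
\qquad\text{and}\qquad
\underbrace{\mathrm{cost}(\mathrm{ALG}) \le \beta \cdot \mathrm{cost}(\mathrm{OPT})\ \text{for arbitrary prediction error}}_{\beta\text{-robustness}},
\]

so a good policy interpolates between the two: its guarantee improves as the predicted urgency levels become more accurate, while never degrading below a worst-case bound when they are badly wrong.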
Deepfakes are computationally-created entities that falsely represent reality. They can take image, video, and audio modalities, and pose a threat to many areas of systems and societies, comprising a topic of interest to various aspects of cybersecurity and cybersafety. In 2020 a workshop consulting AI experts from academia, policing, government, the private sector, and state security agencies ranked deepfakes as the most serious AI threat. These experts noted that since fake material can propagate through many uncontrolled routes, changes in citizen behaviour may be the only effective defence. This study aims to assess human ability to identify image deepfakes of human faces (StyleGAN2:FFHQ) from nondeepfake images (FFHQ), and to assess the effectiveness of simple interventions intended to improve detection accuracy. Using an online survey, 280 participants were randomly allocated to one of four groups: a control group, and 3 assistance interventions. Each participant was shown a sequence of 20 images randomly selected from a pool of 50 deepfake and 50 real images of human faces. Participants were asked if each image was AI-generated or not, to report their confidence, and to describe the reasoning behind each response. Overall detection accuracy was only just above chance and none of the interventions significantly improved this. Participants' confidence in their answers was high and unrelated to accuracy. Assessing the results on a per-image basis reveals participants consistently found certain images harder to label correctly, but reported similarly high confidence regardless of the image. Thus, although participant accuracy was 62% overall, this accuracy across images ranged quite evenly between 85% and 30%, with an accuracy of below 50% for one in every five images. We interpret the findings as suggesting that there is a need for an urgent call to action to address this threat.
We present a method for controlling a swarm using its spectral decomposition -- that is, by describing the set of trajectories of a swarm in terms of a spatial distribution throughout the operational domain -- guaranteeing scale invariance with respect to the number of agents both for computation and for the operator tasked with controlling the swarm. We use ergodic control, decentralized across the network, for implementation. In the DARPA OFFSET program field setting, we test this interface design for the operator using the STOMP interface -- the same interface used by Raytheon BBN throughout the duration of the OFFSET program. In these tests, we demonstrate that our approach is scale-invariant -- the user specification does not depend on the number of agents; it is persistent -- the specification remains active until the user specifies a new command; and it is real-time -- the user can interact with and interrupt the swarm at any time. Moreover, we show that the spectral/ergodic specification of swarm behavior degrades gracefully as the number of agents goes down, enabling the operator to maintain the same approach as agents become disabled or are added to the network. We demonstrate the scale-invariance and dynamic response of our system in a field-relevant simulator on a variety of tactical scenarios with up to 50 agents. We also demonstrate the dynamic response of our system in the field with a smaller team of agents. Lastly, we make the code for our system available.
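The spectral specification mentioned above is typically quantified with the standard ergodic metric (recalled here as generic background, not the paper's exact implementation): the time-averaged statistics of the agents' trajectories are compared with the target spatial distribution in Fourier space,

\[
\mathcal{E}(t) \;=\; \sum_{k} \Lambda_k \,\bigl| c_k(t) - \xi_k \bigr|^{2},
\qquad
c_k(t) \;=\; \frac{1}{N t}\sum_{j=1}^{N}\int_{0}^{t} f_k\bigl(x_j(\tau)\bigr)\,d\tau,
\]

where \(\xi_k\) are the Fourier coefficients of the desired spatial distribution, \(f_k\) are the corresponding basis functions, and \(\Lambda_k\) are decaying weights that emphasize low spatial frequencies. Because the operator only specifies the coefficients \(\xi_k\), the command is independent of the number of agents \(N\), which is what yields the scale invariance described above.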
While natural systems often present collective intelligence that allows them to self-organize and adapt to changes, the equivalent is missing in most artificial systems. We explore the possibility of such a system in the context of cooperative object manipulation using mobile robots. Although conventional works demonstrate potential solutions for the problem in restricted settings, they have computational and learning difficulties. More importantly, these systems do not possess the ability to adapt when facing environmental changes. In this work, we show that by distilling a planner derived from a gradient-based soft-body physics simulator into an attention-based neural network, our multi-robot manipulation system can achieve better performance than baselines. In addition, our system also generalizes to unseen configurations during training and is able to adapt toward task completions when external turbulence and environmental changes are applied.
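A minimal sketch of the distillation step described above, assuming the planner exposes target robot actions for sampled object configurations and the student is an attention-based policy trained by regression; every name, shape, and loss choice here is an assumption for illustration, not the paper's actual pipeline.

```python
# Hypothetical distillation of a (slow) physics-based planner into a fast
# attention-based policy. Shapes, names, and the L2 imitation loss are assumptions.
import torch
import torch.nn as nn

class AttentionPolicy(nn.Module):
    """Maps per-robot observations to per-robot actions with self-attention."""
    def __init__(self, obs_dim=8, act_dim=2, d_model=64):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        self.attn = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(d_model, act_dim)

    def forward(self, obs):                  # obs: (batch, num_robots, obs_dim)
        return self.head(self.attn(self.embed(obs)))

def planner_actions(obs):
    """Stand-in for the gradient-based soft-body planner (expensive teacher)."""
    return torch.tanh(obs[..., :2])          # placeholder target actions

student = AttentionPolicy()
opt = torch.optim.Adam(student.parameters(), lr=3e-4)

for step in range(1000):
    obs = torch.randn(32, 6, 8)              # 32 sampled scenes, 6 robots each
    with torch.no_grad():
        target = planner_actions(obs)        # query the teacher planner
    loss = ((student(obs) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```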
Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data. We find that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups (zero-shot, few-shot, CoT), and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generation). For instance, Flan-PaLM 540B instruction-finetuned on 1.8K tasks outperforms PaLM 540B by a large margin (+9.4% on average). Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.
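To make concrete what "phrasing datasets as instructions" means in practice, here is a toy formatting function in the spirit of instruction finetuning; the template wording and field names are invented for illustration and are not the actual Flan templates.

```python
# Toy example of turning a supervised example into an instruction-style prompt.
# The template text is invented; the real Flan collection uses many templates per task.
def to_instruction_example(task_description, input_text, answer):
    prompt = f"{task_description}\n\nInput: {input_text}\nAnswer:"
    return {"prompt": prompt, "target": " " + answer}

example = to_instruction_example(
    task_description="Determine whether the premise entails the hypothesis.",
    input_text="Premise: The cat sleeps. Hypothesis: An animal is resting.",
    answer="yes")
print(example["prompt"])
```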